Because of Python's flexibility, it is not only easy to do network analysis using NetworkX, but with a minimal amount of code we can simulate dynamics on networks.
In this section we'll simulate disease dynamics on NetworkX graphs. This last section is optional, you can choose to work on it, or to try to start doing your own work using NetworkX. I'll be available to help.
In this section you will implement and describe SIR dynamics on a network. You should implement the dynamics as follows:
Note here that we are assuming that nodes are infected for exactly one time step before they move into a recovered state, i.e. $\gamma=1$. Below is the outline of the function
In [4]:
import networkx as nx
In [ ]:
import numpy as np
In [5]:
def simulate_SIR(G,beta,n0=None):
if n0 is None:
n0 = [random.choice(G.nodes())]
infected = [set(n0)]
recovered = set([])
t = 1
while True:
#
return infected,recovered
There are three graphs stored in the data folder that correspond to three different real world networks. They are stored as edgelists so use the appropriate function to read them
celegans.g: A metabolic network of the nematode
Caenorhabditis elegans[1]jazz.g: A network of collaborations among Jazz
Musicians.[2]ISPDec2014.g: A network of connections between ISPs, taken
during the last week of December 2014.[3][1] Duch, Jordi, and Alex Arenas. "Community detection in complex networks using extremal optimization." Physical review E 72.2 (2005): 027104.
[2] Gleiser, Pablo M., and Leon Danon. "Community structure in jazz." Advances in complex systems 6.04 (2003): 565-573.
[3] Edwards, Benjamin, et al. "Analyzing and Modeling Longitudinal Security Data: Promise and Pitfalls." Proceedings of the 31st Annual Computer Security Applications Conference. ACM, 2015.
For each graph, simulate the SIR dynamics starting with a single node and record the fraction of nodes in the network infected over time for three different values of $\beta$. Because the SIR dynamics are stochastic, you will have to simulate each infection multiple times. Plot the average of these runs over time as well as the standard errors as error bars for each value of $\beta$.
Next investigate you will investigate the behavior of infection spread for various values of $\beta$. Select at least 20 different values of $\beta$. Simulate the SIR dynamics on the network starting with a random node, measuring the total proportion of the network that becomes infected. Be sure to simulate the infection enough times that you can reasonably estimate mean and standard deviation of each of these measures (at least 100). For each measure, make a plot of the $\beta$ values and the measure for each of the three networks. Include the mean and error bars for the standard deviation. Report on what each measure tells us about the three different networks.
It might be important to know which nodes in the network are most capable of spreading disease. This may be important to identify the best way to stop the spread of infections or the best way to spread information in a network. We will measure how influential a node is by measuring the average proportion of the network which becomes infected when the infection starts with that node. Using $\beta=0.2$ measure the mean infection size when started from each node in each network. Once again be run at least 100 simulations for each node. In a table, report the most and least influential nodes, and the average size of the infection they create.
Rather than run simulations it may be useful to use other network measures to identify influential spreaders in the network. Using the software package of your choice compute each of the following measures for each node in the network:
For each network, make a scatter plot where each point is each the $(x,y)$ pair, where $x$ is each of the above measures for a single node, and $y$ is the average infection size when the infection starts at that node.
Which of the measures provides the best prediction of infection size? Why does each perform well or poorly? Investigate other network measures and speculate why they might be better at identifying influential spreaders. Be sure to cite your sources.
In [ ]: